Background

I wrote this notebook as a simple training exercise to better understand feedforward neural networks. The naming conventions in this code match those used in Andrew Ng's free online Machine Learning course on Coursera (highly recommended). This neural network has a single hidden layer.

Here's how the neural network is connected, along with the equations for calculating the hypothesis, h_theta(x).
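
In the row-vector convention used by the code below, with g the sigmoid function and a bias unit of 1 prepended to each layer, the forward pass works out to roughly:

$$a^{(1)} = \begin{bmatrix}1 & x\end{bmatrix}$$
$$a^{(2)} = \begin{bmatrix}1 & g\!\left(a^{(1)} \Theta^{(1)}\right)\end{bmatrix}$$
$$h_\Theta(x) = a^{(3)} = g\!\left(a^{(2)} \Theta^{(2)}\right)$$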

During training, this neural network also uses backpropagation: the difference between the hypothesis and the training data is propagated back through the layers to update the thetas, or weights, in the network.
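
In the same notation, the per-iteration weight updates implemented below are approximately:

$$\delta^{(3)} = y - a^{(3)}$$
$$\delta^{(2)} = \left(\delta^{(3)} \Theta^{(2)\top}\right) \odot a^{(2)} \odot \left(1 - a^{(2)}\right)$$
$$\Theta^{(2)} \leftarrow \Theta^{(2)} + \tfrac{1}{m}\, a^{(2)\top} \delta^{(3)}$$
$$\Theta^{(1)} \leftarrow \Theta^{(1)} + \tfrac{1}{m}\, a^{(1)\top} \delta^{(2)} \quad \text{(dropping the bias column of } \delta^{(2)}\text{)}$$

where m is the number of training observations.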

The example has a trivial training set with X equal to

0 0
0 1
1 0
1 1

and the y vector used for this supervised learning matches the exclusive or (XOR) pattern.

0
1
1
0

Note: the images above are from Andrew Ng's Machine Learning Course.


In [1]:
# NumPy is the fundamental package for scientific computing with Python.
import numpy as np

The theta_init function is used to initialize the thetas (weights) in the network. It returns a random (in_size + 1) x out_size matrix with values in the range [-epsilon, epsilon]; the extra row holds the weights for the bias unit.


In [2]:
def theta_init(in_size, out_size, epsilon = 0.12):
    # Uniform random weights centered on zero; the +1 row is for the bias unit
    return np.random.rand(in_size + 1, out_size) * 2 * epsilon - epsilon
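
For example, with two input features and five hidden nodes, theta_init should return a 3-by-5 matrix whose entries stay inside the epsilon range; a quick check along these lines:

theta = theta_init(2, 5)
print(theta.shape)                   # (3, 5)
print(np.abs(theta).max() <= 0.12)   # True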

This network uses a sigmoid activation function. The sigmoid derivative is used during backpropagation; note that sigmoid_derivative below expects the already-activated value a = g(z), for which g'(z) = a * (1 - a).


In [3]:
def sigmoid(x):
    return np.divide(1.0, (1.0 + np.exp(-x)))
def sigmoid_derivative(x):
    return np.multiply(x, (1.0 - x))
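
As a quick sanity check, sigmoid(0) should be 0.5, and feeding an activation back into sigmoid_derivative should give g(z) * (1 - g(z)):

print(sigmoid(0.0))             # 0.5
a = sigmoid(2.0)                # ~0.8808
print(sigmoid_derivative(a))    # ~0.1050, i.e. g'(2)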

The mean squared error (MSE) provides a measure of the distance between the actual values and the values estimated by the neural network.


In [4]:
def mean_squared_error(X):
    return np.power(X, 2).mean(axis=None)
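
For instance, for a small matrix of errors the function averages the squared entries over every element:

errors = np.matrix('0.5 -0.5; 0.1 -0.1')
print(mean_squared_error(errors))   # (0.25 + 0.25 + 0.01 + 0.01) / 4 = 0.13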

The nn_train function trains an artificial neural network with a single hidden layer. Each column in X is a feature and each row in X is a single training observation. The y matrix contains the classification for each observation. For multi-class classification problems, y will have more than one column. After training, this function returns the calculated theta values (weights) that can be used for predictions.

Training ends when either the desired error or the maximum number of iterations is reached, whichever comes first.
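
The defaults can be overridden with keyword arguments; a quick sketch (assuming X and y are defined as in the example further down):

theta1, theta2 = nn_train(X, y, desired_error=0.0005, max_iterations=500000, hidden_nodes=4)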


In [5]:
def nn_train(X, y, desired_error = 0.001, max_iterations = 100000, hidden_nodes = 5):
    
    m = X.shape[0]
    input_nodes = X.shape[1]
    output_nodes = y.shape[1]
    
    # Add the bias unit (a column of ones) to the input layer
    a1 = np.insert(X, 0, 1, axis=1)
    # Randomly initialize the weights for each layer
    theta1 = theta_init(input_nodes, hidden_nodes)
    theta2 = theta_init(hidden_nodes, output_nodes)
    
    for x in range(0, max_iterations):
        # Feedforward
        a2 = np.insert(sigmoid(a1.dot(theta1)), 0, 1, axis=1)
        a3 = sigmoid(a2.dot(theta2))
        
        # Calculate error using backpropagation
        a3_delta = np.subtract(y, a3)
        mse = mean_squared_error(a3_delta)
        if mse <= desired_error:
            print "Achieved requested MSE %f at iteration %d" % (mse, x)
            break
        a2_error = a3_delta.dot(theta2.T)
        a2_delta = np.multiply(a2_error, sigmoid_derivative(a2))
        
        # Update thetas to reduce the error on the next iteration
        theta2 += np.divide(a2.T.dot(a3_delta), m)
        # Drop the first column, which corresponds to the bias unit in a2 and has no incoming weights
        theta1 += np.delete(np.divide(a1.T.dot(a2_delta), m), 0, 1)
        
    return (theta1, theta2)

The nn_predict function takes the theta values calculated by nn_train to make predictions about the data in X.


In [6]:
def nn_predict(X, theta1, theta2):
    # Feedforward only: add the bias unit at each layer and apply the trained weights
    a2 = sigmoid(np.insert(X, 0, 1, axis=1).dot(theta1))
    return sigmoid(np.insert(a2, 0, 1, axis=1).dot(theta2))

Example

We start by plugging our data and classifications into our neural network, which returns the weights we can use to make predictions on new data.


In [7]:
X = np.matrix('0 0; 0 1; 1 0; 1 1')
y = np.matrix('0; 1; 1; 0')
(theta1, theta2) = nn_train(X, y)
print "\nTrained weights for calculating the hidden layer from the input layer"
print theta1
print "\nTrained weights for calculating from the hidden layer to the output layer"
print theta2


Achieved requested MSE 0.000996 at iteration 7042

Trained weights for calculating the hidden layer from the input layer
[[-3.48598873 -1.14246648 -3.87862327 -2.97811678 -2.95639015]
 [ 2.18380911  0.91019568  2.44643187  6.64160673  1.85534936]
 [ 2.19573589  0.90093885  2.45157108  6.64224996  1.84479728]]

Trained weights for calculating from the hidden layer to the output layer
[[ -2.58676058]
 [ -4.55786205]
 [ -2.79803737]
 [ -4.99875789]
 [ 10.58611232]
 [ -4.05273821]]

Now that we've trained the neural network, we can make predictions for new data.


In [8]:
# Our test input contains the same four possible rows as our training input 'X', but in a different order
X_test = np.matrix('1 1; 0 1; 0 0; 1 0')
y_test = np.matrix('0; 1; 0; 1')
y_calc = nn_predict(X_test, theta1, theta2)
y_diff = np.subtract(y_test, y_calc)
print "The MSE for our test set is %f" % (mean_squared_error(y_diff))
print np.concatenate((y_test, y_calc, y_diff), axis=1)


The MSE for our test set is 0.000996
[[ 0.          0.02914754 -0.02914754]
 [ 1.          0.97206182  0.02793818]
 [ 0.          0.03962189 -0.03962189]
 [ 1.          0.97202469  0.02797531]]

Column one is the correct value, column two is the value predicted by this simple neural network, and the third column shows the difference. The neural network correctly learned the XOR pattern.
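
Since the predicted values in column two are sigmoid outputs between 0 and 1, rounding them gives hard class labels; for the values above this recovers y_test:

print(np.round(y_calc))   # 0, 1, 0, 1 -- matches y_test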